Abstract

Background: Clostridium botulinum is a taxonomic designation for at least four diverse species that are defined by the expression of one (monovalent) or two (bivalent) of seven different C. botulinum neurotoxins (BoNTs, A-G). The four species have been classified as C. botulinum Groups I-IV. The presence of bont genes in strains representing the different Groups is probably the result of horizontal transfer of the toxin operons between the species.

Results: Chromosome and plasmid sequences of several C. botulinum strains representing A, B, E and F serotypes and a C. butyricum type E strain were compared to examine their genomic organization, or synteny, and the location of the botulinum toxin complex genes. These comparisons identified synteny among proteolytic (Group I) strains or among nonproteolytic (Group II) strains, but not between the two Groups. The bont complex genes within the strains examined were not randomly located but were found within three regions of the chromosome or at two specific sites within plasmids. A comparison of sequences from a Bf strain revealed homology to the plasmid pCLJ, with similar locations for the bont/bv b genes but with the bont/a4 gene replaced by the bont/f gene. An analysis of the toxin cluster genes showed that many recombination events have occurred, including several events within the ntnh gene. One such recombination event resulted in the integration of the bont/a1 gene into the serotype B toxin ha cluster, giving rise to a successful lineage commonly associated with foodborne botulism outbreaks. In C. botulinum type E and C. butyricum type E strains, the location of the bont/e gene cluster appears to be the result of insertion events that split rarA, a recombination-associated gene, independently at the same location in both species.

Conclusion: The analysis of the genomic sequences representing different strains reveals the presence of insertion sequence (IS) elements and other transposon-associated proteins, such as recombinases, that could facilitate the horizontal transfer of the bonts; these events, in addition to recombination among the toxin complex genes, have led to the lineages observed today within the neurotoxin-producing clostridia.
Background

Clostridium botulinum is a taxonomic designation for at least four diverse groups of Gram-positive, spore-forming, anaerobic bacteria that produce the most potent naturally occurring toxin known, botulinum neurotoxin (BoNT). Production of BoNT has been the single criterion for inclusion within the C. botulinum species and was adopted in order to prevent scientific and medical confusion regarding the intoxication known as botulism. However, this single criterion has resulted in a species designation that encompasses clades of strains that should be considered as four separate species. Phylogenetic analysis of the 16S rrn genes of C. botulinum strains clearly separates them into four Groups (I-IV) and supports this historical classification scheme based upon biochemical and biophysical parameters [1]. Group I contains proteolytic serotype A, B and F strains, as well as bivalent (bv) Ab, Ba, Af and Bf strains; Group II consists of nonproteolytic (np) and saccharolytic serotype B, E and F strains; Group III consists of serotype C and D strains; and Group IV consists solely of serotype G strains [2]. Group IV has been recognized as a distinct species and its members have been given the additional name of C. argentinense [3]. Further Group designations (V and VI) have been proposed for other clostridial species found to express BoNT, such as the BoNT/F-producing C. baratii strains and the BoNT/E-producing C. butyricum strains [4].

Figure 1 and previously published 16S rrn dendrograms show the relationship of the bont-containing strains to each other and to other clostridial species [5,6]. Group I shares a recent common ancestor with nontoxic C. sporogenes. Group II is a subset of a more diverse clade that includes other saccharolytic clostridia, such as C. acetobutylicum, C. beijerinckii, and toxic and nontoxic Group V C. baratii and Group VI C. butyricum. Group III strains produce BoNT/C, BoNT/D and mosaic C/D and D/C toxins and share a recent common ancestor with nontoxic C. novyi. Group IV, producing BoNT/G, shares a clade with C. subterminale and C. proteolyticus. Recent microarray analyses of Group I strains confirm the close relationship of these strains with C. sporogenes and the disparity in gene content between Group I and Group II strains [7].

The 16S rrn dendrogram also shows that the tetanus toxin-producing species, C. tetani, occupies a distinct clade when compared to the other clostridial species. Its genome was one of the first clostridial genomes to be sequenced, revealing the presence of the tetanus toxin gene within a 74 kb plasmid [8]. Recent genomic sequences of different C. botulinum strains have revealed that single or bivalent bonts are located within plasmids as often as within the chromosome [9-11]. Unlike tetanus toxin, which appears uniform from strain to strain, bont gene sequence comparisons have identified multiple variants that are recognized as serotypes and subtypes.


Comparisons of the BoNT/A-G protein sequences in strains representing the different Groups show that BoNT protein identities range from 34% to 64% among the seven serotypes [9]. In addition, the variation observed in BoNT protein sequences within each serotype, except type G, has resulted in the designation of BoNT subtypes within a serotype (for example, subtypes A1-A5 within BoNT/A).
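Percent identity values such as those quoted above depend on how aligned positions are counted. The sketch below is illustrative only: it assumes a pairwise alignment has already been produced (gaps written as "-"), counts identity over columns where neither sequence has a gap, and uses a hypothetical helper name and toy peptides rather than real BoNT sequences. Other conventions, such as dividing by the full alignment length, give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment (hypothetical helper)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy peptides, not real BoNT sequences: 8 of 10 gap-free columns match.
print(percent_identity("MPFVNKQ-FNY", "MPFINKSGFNY"))  # 80.0
```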

The discordance between the serological classification of the toxins and the 16S rrn analyses and Group designations indicates that the bont genes have been horizontally transferred between various clostridial lineages. Horizontal gene transfer events are observed within other bacterial species and contribute to bacterial evolution [12]. Although the exact transfer mechanisms active within the clostridia remain unclear, the regions flanking the bont and toxin complex genes include partial and complete insertion sequence (IS) elements and gene duplication events indicative of mobile element activity. In addition, the genes of several bonts are located within plasmids or phage [9-11]. These findings suggest possible mechanisms that could enable the horizontal transfer of bont [13]. Recombination events within the bont genes (mosaic bont/c/d and bont/a1/a3, for example) and within the ntnh gene that precedes the bont gene have been observed and contribute significantly to BoNT diversity [5,6,13,14]. Although the three plasmids that contain the bont/a3, bont/a4, bont/bv b or bont/b1 genes are largely homologous, each shows regions of inversions and deletions [9].

Because the toxin complex genes appear to move among the clostridia, they cannot be used to infer the phylogenetic relationships of the host bacteria. However, the sequences and the locations of the bont gene clusters provide clues to earlier gene transfer and recombination events. In order to better understand these events, we compared the available genomic sequences of several strains within the Group I, II and VI designations. Chromosome and plasmid synteny were analysed, and the specific locations of, and sequences flanking, the bont complex genes were examined within C. botulinum type A, B, E and F strains and a C. butyricum type E strain. Plasmid locations for the bont/np b gene within the Eklund 17B strain and for the bont/bv b and bont/f genes within the bivalent Bf strain were identified. A detailed examination of the toxin complex genes and their flanking regions revealed recombination and insertion events that have contributed to the diversity observed today.

Results 

Chromosomal  and  plasmid  synteny 

The chromosomal and plasmid sequences from strains representing multiple C. botulinum serotypes and subtypes of A, B, E and F, two bivalent strains (BoNT/Ba4, BoNT/Bf), a BoNT/E-expressing C. butyricum, a C. tetani and a C. sporogenes strain (Table 1) were compared in order to investigate their overall organization, or synteny. Comparisons of the completed chromosomal sequences of the three BoNT/A1 strains (ATCC 3502, ATCC 19397, Hall) revealed that these strains are nearly identical in genomic organization (data not shown). The history of the three strains is not clear; however, they appear to be different strains isolated from foodborne outbreaks of botulism [15]. The serotype A Hall strain is distinctive in that it produces a high concentration of toxin in culture [16]. Unique to the ATCC 3502 strain is the presence of a 16 kb plasmid [17]. Neither this intact plasmid nor its sequences were found within the chromosomes of the other two BoNT/A1 strains.

Figure 1

A 16S rrn dendrogram of clostridial species. The 16S rrn genes from the 15 strains examined in this study (13 C. botulinum, indicated in red; one BoNT/E-producing C. butyricum, in red; and one C. sporogenes, in blue) were aligned to the 16S rrn genes of different Clostridium species identified within GenBank via BLAST searches. A maximum likelihood tree using 78 sequences, with four outgroup sequences from the Alkaliphilus genus (removed), was generated from 1,208 nucleotides. The scale bar of 0.03 represents three point mutations per 100 bases, or 3% diversity between sequences. Two 16S rrn gene sequences from C. sporogenes ATCC 15579 are included. The 16S rrn dendrogram illustrates the genetic diversity within the Clostridium genus and among strains within the Group I-VI designations.

Figure 2 (panel 1a) compares the genomic synteny of the Hall BoNT/A1 strain to other C. botulinum Group I strains representing serotypes A, B and F. The plot shows that the chromosomes of strains representing four BoNT/A subtypes (BoNT/A1-A4), BoNT/B1 or BoNT/F share similar organization. In contrast, there is little chromosomal synteny between the Group II C. botulinum serotype E strains and the Group I Hall strain or the C. butyricum type E strain (Figure 2, panels 1b, 1c). The two BoNT/E-producing C. botulinum strains (Alaska E43 and Beluga) were similar to each other and also to the npBoNT/B Eklund 17B strain (data not shown). These comparisons revealed a large (404 kb) inversion within the Eklund 17B chromosome, relative to the C. botulinum serotype E strains, that is not in a region containing the bont/e gene cluster. No chromosomal synteny was observed when the C. botulinum Group I and Group II strain sequences were compared to the C. tetani E88 strain (data not shown). A comparison of the four contigs of C. sporogenes ATCC 15579 to the Hall BoNT/A1 strain (Figure 2, panel 1d) revealed genomic synteny and a large 701 kb inversion between the two species. The four panels (1a-d) contrast the genomic organization among Group I, II and VI strains and show that Group I strains share a similar gross chromosomal organization with each other and with C. sporogenes, which differs from that of Group II and VI strains.
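Synteny plots of the kind described here are dot plots of shared sequence between a reference (x-axis) and a query (y-axis), with forward matches drawn in one colour and reverse-complement matches in another. The following is a minimal sketch of that idea, assuming exact k-mer matches as the matching criterion; it is not the alignment method used to produce Figure 2, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot_hits(reference: str, query: str, k: int = 8) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (forward, reverse) lists of (ref_pos, query_pos) exact k-mer hits.

    Forward hits: the query k-mer occurs as-is in the reference.
    Reverse hits: the reverse complement of the query k-mer occurs in the reference.
    """
    # Index every k-mer start position in the reference.
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    forward, reverse = [], []
    for j in range(len(query) - k + 1):
        kmer = query[j:j + k]
        for i in index.get(kmer, []):
            forward.append((i, j))
        for i in index.get(revcomp(kmer), []):
            reverse.append((i, j))
    return forward, reverse
```

A conserved gene order appears as a continuous diagonal of forward hits, while an inverted segment, such as the 701 kb inversion between C. sporogenes and the Hall strain, would appear as a run of reverse hits along an anti-diagonal.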

Plasmid synteny was also examined by comparing the bont-containing plasmids from Group I (pCLK with bont/a3, pCLJ with bont/a4 and bont/bv b, pCLD with bont/b1) to each other, to the Group II pCLL with bont/np b and to pE88 in C. tetani. These plasmids contain genes encoding 329 proteins (pCLK), 195 proteins (pCLD), 305 proteins (pCLJ), 54 proteins (pCLL) and 59 proteins (pE88). Although the plasmids containing bont/a3, bont/a4 and bont/b1 vary in size (148-270 kb), Figure 2 panel 2a shows large regions of conserved organization among these plasmids and a small inversion (16.7 kb) that contains the bont/a3 relative to the bont/a4.
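Once the alignments between two sequences are summarized as blocks on the reference, each with a strand, the size of an inversion can be read off as the span of adjacent reverse-strand blocks. The sketch below assumes a hypothetical (start, end, strand) block format and uses made-up coordinates chosen to yield a 16.7 kb segment; it does not reproduce the actual pCLK/pCLJ coordinates.

```python
from typing import List, Tuple

Block = Tuple[int, int, str]  # (ref_start, ref_end, strand): hypothetical block format


def inverted_segments(blocks: List[Block]) -> List[Tuple[int, int, int]]:
    """Merge abutting or overlapping reverse-strand blocks into (start, end, length) segments."""
    segments = []
    current = None  # open inverted segment as (start, end), or None
    for start, end, strand in sorted(blocks):
        if strand == "-":
            if current is not None and start <= current[1]:
                # Extend the open inverted segment.
                current = (current[0], max(current[1], end))
            else:
                if current is not None:
                    segments.append(current)
                current = (start, end)
        elif current is not None:
            # A forward-strand block closes the open inverted segment.
            segments.append(current)
            current = None
    if current is not None:
        segments.append(current)
    return [(s, e, e - s) for s, e in segments]


# Made-up coordinates: two reverse blocks flanked by forward blocks.
blocks = [(0, 50000, "+"), (50000, 58000, "-"), (58000, 66700, "-"), (66700, 120000, "+")]
print(inverted_segments(blocks))  # [(50000, 66700, 16700)]
```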

The genomic sequence of the Group II B strain, Eklund 17B, revealed the location of the bont/np b within a small (47.6 kb) plasmid, pCLL, that was unique when compared to other bont-containing plasmids. Synteny plots show that pCLL differs from pCLK (Figure 2 panel 2b) and from pE88, the plasmid within C. tetani that contains the tetanus toxin gene (Figure 2 panel 2c). None of the C. botulinum plasmids (pCLK, pCLJ or pCLD) shared synteny with C. tetani pE88 (data not shown).

Table 1: List of analyzed genomes.

| Species | Subtype¹ | Strain | Group | Locus tag ID² | GenBank accession³ | Toxin complex⁴ | BoNT location⁵ |
|---|---|---|---|---|---|---|---|
| C. botulinum | A1 | ATCC 3502 | I | CBO | AM412317/AM412318 | HA-A1 | chr/oppA |
| C. botulinum | A1 | ATCC 19397 | I | CLB | CP000726 | HA-A1 | chr/oppA |
| C. botulinum | A1 | Hall | I | CLC | CP000727 | HA-A1 | chr/oppA |
| C. botulinum | A1(B) | NCTC 2916 | I | CBN | ABDO02000001-49 | orfX-A1, HA-(B) | chr/arsC, chr/oppA |
| C. botulinum | A2 | Kyoto-F | I | CLM | CP001581 | orfX-A2 | chr/arsC |
| C. botulinum | A3 | Loch Maree | I | CLK | CP000962/CP000963 | orfX-A3 | plasmid |
| C. botulinum | Ba4 | Strain 657 | I | CLJ | CP001083/CP001081/CP001082 | orfX-A4, HA-bvB | plasmid |
| C. botulinum | B1 | Okra | I | CLD | CP000939/CP000940 | HA-B1 | plasmid |
| C. botulinum | Bf |  | I | CBB | ABDP01000001-70 | HA-bvB, orfX-F | plasmid |
| C. botulinum | prot F | Langeland | I | CLI | CP000728/CP000729 | orfX-F | chr/arsC |
| C. botulinum | npB | Eklund 17B | II | CLL | CP001056/CP001057 | HA-npB | plasmid⁶ |
| C. botulinum | E1 | Beluga | II | CLO | ACSC01000001-16 | orfX-E1 | chr/rarA |
| C. botulinum | E3 | Alaska E43 | II | CLH | CP001078 | orfX-E3 | chr/rarA |
| C. butyricum | E4 | BL 5262 | II | CLP | ACOM01000001-13 | orfX-E4 | chr/rarA |
| C. tetani | tetanus | E88 | - | CTC | NC_004557/NC_004565 | p21-NT | plasmid |
| C. sporogenes | N/A | ATCC 15579 | I | CLOSPO | ABKW02000001-4 | N/A | N/A |

¹ Subtype designations as listed in Hill et al., 2007
² Locus tag ID designations listed in GenBank and Hill et al., 2007
³ Accessions listed as chromosome/plasmid
⁴ orfX = orfX1, orfX2, orfX3; HA = HA17, HA33, HA70 accessory proteins
⁵ Location: chr = chromosome, oppA = oppA/brnQ operon, arsC = arsC operon, rarA = rarA operon; plasmid = plasmid
⁶ Plasmid does not share homology with Group I plasmids

Figure 2

Chromosomal and plasmid synteny plots. Panels 1a-d and 2a-d show four synteny plots each of chromosomal or plasmid sequence alignments, respectively. The reference sequence listed on the x-axis was queried with the strain sequence listed on the y-axis. The red dots indicate forward matches of the sequence comparisons; the blue dots indicate reverse complement matches. The continuous diagonal line in the plot in panel 1a illustrates the overall chromosomal organization, or synteny, shared between the proteolytic Hall strain and the Kyoto-F, Loch Maree, 657, Okra and Langeland strains. The panel 1b and 1c plots compare Hall and C. butyricum BL 5262 to the BoNT/E-producing Alaska E43 strain, where little synteny is observed. In panel 1d, four contigs of C. sporogenes ATCC 15579 are compared to the Hall strain and reveal genomic synteny and a 701 kb inversion between the two species. Panels 2a-d examine plasmid synteny. The diagonal lines in panel 2a illustrate that the Loch Maree pCLK has a similar organization to pCLJ, with a small 16.7 kb inversion that includes the bont/a3 relative to the bont/a4. Panels 2b and 2c show that pCLL within Eklund 17B does not share synteny with either pCLK or pE88, which contains the tetanus toxin gene. In panel 2d, four contigs of the Bf strain show synteny to pCLJ and the 16.7 kb inversion of bont/a4 relative to the bont/f.

Although the sequence data for the Bf strain are incomplete, four Bf contigs share synteny with the bivalent pCLJ that contains bont/a4 and bont/bv b (Figure 2 panel 2d). The same inversion (16.7 kb) identified in panel 2a is observed when the contigs are compared to pCLJ. The evidence for the plasmid location of bont/bv b and bont/f is supported by the sequence homology of the four contigs to pCLJ, and a detailed examination of the location of bont/bv b and bont/f is described later.

These results show that the Group I C. botulinum A, B and F strains share a similar chromosome organization with each other and with C. sporogenes, but not with the Group II nonproteolytic B strain or serotype E strains, the Group VI BoNT/E-producing C. butyricum, or C. tetani. The plasmids containing the bont/a3, bont/a4, bont/bv b, bont/b1 or bont/f gene clusters also show similarity to each other but not to C. tetani pE88 or to pCLL containing bont/np b. Comparisons between the Group II C. botulinum BoNT/E- or npBoNT/B-producing strains revealed that their chromosomal backgrounds share synteny with each other but not with the Group VI C. butyricum type E strain. These relationships confirm the different genomic backgrounds within C. botulinum and C. tetani and support the 16S rrn analyses and historical C. botulinum Group designations.

Components of the BoNT gene clusters

The arrangement and composition of the toxin gene clusters in strains representing the different serotypes and subtypes of C. botulinum and BoNT/E-producing C. butyricum are shown in Figure 3. A comparison of these regions shows, in general, that the BoNT gene is located in either of two conserved toxin gene cluster arrangements, composed of either the ha70-ha17-ha33-botR-ntnh-bont complex genes (abbreviated ha cluster) or the orfX3-orfX2-orfX1-(botR)-p47-ntnh-bont complex genes (abbreviated orfX cluster). The characteristics of the different proteins and their arrangements have been previously reported for strains representing the different serotypes [5,6]. The toxin complex proteins, with the exception of the regulatory protein BotR (P21), are thought to play a protective role for the BoNT in the gastrointestinal tract [18]. There is evidence that the hemagglutinin (HA) proteins may also help facilitate the absorption of BoNT from the intestines into the bloodstream [19]. While all of the genes within the ha cluster express proteins that are part of the toxin complex, the expression and function of the orfX proteins within the orfX cluster remain unknown. The presence of genes that encode the complex proteins in each of the different serotypes suggests that these proteins play a role in the expression, stability and/or transport of the BoNT.
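The two arrangements can be told apart from the gene content of a cluster. A minimal sketch, assuming gene names are supplied as strings like those used above and treating ha70/ha17/ha33 and orfX1-3/p47 as marker sets; botR is not used as a marker because it is optional in the orfX cluster. The function name is hypothetical, and real annotations with partial or renamed genes would need more care.

```python
from typing import Iterable

# Marker genes for the two arrangements described above.
HA_MARKERS = {"ha70", "ha17", "ha33"}
ORFX_MARKERS = {"orfX3", "orfX2", "orfX1", "p47"}
CORE = {"ntnh", "bont"}  # present in both arrangements


def classify_cluster(genes: Iterable[str]) -> str:
    """Classify a toxin gene cluster as 'ha', 'orfX' or 'unknown' from its gene names."""
    present = set(genes)
    if not CORE <= present:
        return "unknown"
    has_ha = HA_MARKERS <= present
    has_orfx = ORFX_MARKERS <= present
    if has_ha and not has_orfx:
        return "ha"
    if has_orfx and not has_ha:
        return "orfX"
    return "unknown"


print(classify_cluster(["ha70", "ha17", "ha33", "botR", "ntnh", "bont"]))  # ha
print(classify_cluster(["orfX3", "orfX2", "orfX1", "p47", "ntnh", "bont"]))  # orfX
```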

Figure 3 shows that the ha gene cluster is found within serotype A subtype BoNT/A1 strains and all of the serotype B strains, including the gene cluster harboring the silent bont/(b) gene within BoNT/A1(B) strains. The orfX gene cluster is found within all of the other strains examined here, including the BoNT/A2, BoNT/A3 and BoNT/A4 strains and the bont/a1 gene cluster within the BoNT/A1(B) strain. It is also found within the proteolytic BoNT/F Langeland strain, the bont/f gene cluster in the bivalent Bf strain, the BoNT/E1 and BoNT/E3 strains and the BoNT/E-producing C. butyricum strain.
Figure 3

BoNT complex and flanking regions in different strains. The bont gene cluster, flanking regions and location (chromosome or plasmid) are indicated for the different strains. The orfX cluster (orfX3-orfX2-orfX1-(botR)-p47-ntnh-bont complex genes) is present in the BoNT/E-producing strains (C. botulinum and C. butyricum), the BoNT/A1 of the A1(B) strain, serotype F (BoNT/F and BoNT/bvF) and the BoNT/A2-A4 subtypes. The ha cluster (ha70-ha17-ha33-botR-ntnh-bont complex genes) is present in the serotype B strains containing BoNT/bvB, BoNT/B1, npBoNT/B and BoNT/(B), and in the BoNT/A1 of the Hall strain. The flanking regions consist of IS elements, flagellin (fla), lycA and hypothetical (hypo) proteins. The prime symbol indicates a partial gene.
The bont/a1 gene appears to be the only bont so far identified within both types of toxin gene cluster. The bont/a1 gene in strains ATCC 3502, ATCC 19397 and Hall is located within the ha cluster, whereas the bont/a1 within the BoNT/A1(B) strain, as well as within several other BoNT/A1 strains, is located within the orfX cluster [20]. It appears that the location of the bont/a1 gene within the ha cluster resulted from a recombination event in the middle of the serotype B ntnh gene that has been previously reported [21]. The first half of the ntnh gene in the BoNT/A1 strain is 99.7% identical to the ntnh within serotype B strains. After a recombination event occurring approximately 1,965 nucleotides from the start codon of the 3,594 bp gene, the second half of the ntnh gene is equally similar to the ntnh gene in serotype A2, A3 and A4 strains (90 to 95% identity). This event has resulted in a bont/a1 gene residing within an ha cluster that contains a hybrid, or recombinant, B/A ntnh gene.
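A single recombination breakpoint of this kind can be located by asking where a recombinant sequence switches from resembling one parent to resembling the other. The sketch below assumes the recombinant and the two parental sequences are already aligned to the same length without gaps, and scores every possible single breakpoint by the matches to parent A on its left plus the matches to parent B on its right. It is a simplified illustration with hypothetical names, not necessarily the method used to identify the event described above.

```python
def best_breakpoint(query: str, parent_a: str, parent_b: str) -> int:
    """Return the 0-based position p maximizing matches of query[:p] to parent_a
    plus matches of query[p:] to parent_b (leftmost p on ties).

    Assumes all three sequences are pre-aligned to the same length (hypothetical input).
    """
    n = len(query)
    if not (len(parent_a) == len(parent_b) == n):
        raise ValueError("sequences must be aligned to equal length")
    # left[p]: matches to parent_a in query[:p]
    left = [0] * (n + 1)
    for i in range(n):
        left[i + 1] = left[i] + (query[i] == parent_a[i])
    # right[p]: matches to parent_b in query[p:]
    right = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        right[i] = right[i + 1] + (query[i] == parent_b[i])
    return max(range(n + 1), key=lambda p: left[p] + right[p])


print(best_breakpoint("AAAACCCCCC", "A" * 10, "C" * 10))  # 4
```

Because only positions where the two parents differ are informative, ties are common in real data; the function returns the leftmost best position, so in practice a breakpoint is resolved only to the interval between neighbouring informative sites.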

The ntnh recombination event that placed the bont/a1 gene within the ha cluster has resulted in a very successful lineage that is frequently identified in botulism cases. The many strains representing this event, such as ATCC 3502, ATCC 19397 and Hall, contribute to the acceptance of the ha cluster in association with the bont/a1 gene. However, the orfX cluster is more likely to be the ancestral toxin gene cluster containing the bont/a1 gene, as indicated by the location of the other bont/a subtype genes (bont/a2, bont/a3 and bont/a4) and of the bont/a1 gene of the silent B strains within the orfX cluster. In addition, the bont/a1 genes within the ha cluster are located in a different region of the chromosome from the bont/a1 genes in the orfX cluster, as described below.

Location  of  the  BoNTs  within  the  chromosome 

Because the strains within each C. botulinum Group showed genomic synteny when compared to each other, the chromosomal or plasmid location of each bont gene was examined to determine whether the regions containing the different bont genes had similar features. This analysis revealed that the bont genes in these strains are not randomly distributed but rather are found within three specific sites within the chromosome: (1) the arsC operon, which contains either the bont/a2, the bont/f or the orfX-bont/a1 of the silent BoNT/A(B) strains; (2) the oppA/brnQ operon, which contains either the ntnh-recombinant (ha) bont/a1 or the bont/(b); and (3) the rarA operon, which contains the bont/e within the C. botulinum and C. butyricum type E strains. Figure 4 shows the location of these sites in relation to the ATCC 3502 or Beluga chromosome: the arsC operon at approximately 847 kb, the oppA/brnQ operon at approximately 895 kb and the rarA operon at approximately 2,704 kb.

The arsC gene is part of a group of genes (arsA, arsB, arsC, arsD and arsR) that encode proteins involved in arsenic reduction. BoNT/A1, BoNT/A1(B), BoNT/A2 and BoNT/F strains contain all five genes, but BoNT/A3, BoNT/Ba4 and BoNT/B1 strains lack the arsA, arsB and arsD genes. Recently, it has been shown that certain BoNT/B2 strains lacking the full gene complement are sensitive to arsenic, while BoNT/B2 strains containing all five genes are relatively resistant to arsenic [22].

An expanded view of the arsC operon in Figure 5 shows the different constituents of this location in the different strains. Within this region of approximately 20 kb, three bont genes can be found: the orfX-bont/a1 of BoNT/A(B) strains, the proteolytic bont/f and the bont/a2. A striking similarity is seen between the region surrounding the bont/a1 cluster and that surrounding the bont/f cluster: these two different serotypes contain many of the same genes in the same order at this location. The bont/a2 gene cluster is also located here, but this region is not as similar to the corresponding regions within the BoNT/A1(B) or BoNT/F strains as those two are to each other. As has previously been reported, the bont/a2 is located between two copies of arsC [23]. Other strains, such as those containing the bont/a3, bont/a4, bont/bv b or bont/b1 genes, have no bont genes within this region.

Since some of these strains contain multiple arsC genes, a dendrogram of the various copies was created to compare genes within and among the strains (Figure 6). The arsC dendrogram shows that the sequences of the arsC genes are not identical within a strain or between strains. It also shows that the three copies within the BoNT/A2 strain differ from each other, as do the two copies found within the BoNT/A1(B) and BoNT/F strains. The single arsC gene within C. sporogenes is more closely related to one of the copies within the Group I strains. The copies within the Eklund 17B and Alaska E43 strains are nearly identical to each other but differ from the arsC within C. butyricum.
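A dendrogram of gene copies starts from a comparison of aligned sequences. As an illustration only, the sketch below computes the simplest such comparison, an uncorrected p-distance (fraction of differing positions) for every pair of named copies, and reports the closest pair. It assumes equal-length, gap-free aligned input and uses toy sequences and hypothetical names; the tree-building method behind Figure 6 is not reproduced here.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("need non-empty sequences of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences (names sorted within each pair)."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(sorted(seqs), 2)}


def closest_pair(matrix: Dict[Tuple[str, str], float]) -> Tuple[str, str]:
    """Pair with the smallest distance (first one encountered on ties)."""
    return min(matrix, key=matrix.get)


# Toy "arsC copies", not real sequences.
copies = {"copy1": "ACGTACGTAC", "copy2": "ACGTACGTAA", "copy3": "TTGTACGAAC"}
print(closest_pair(distance_matrix(copies)))  # ('copy1', 'copy2')
```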

About 25 kb downstream from the arsC operon in the ATCC 3502 strain is the oppA/brnQ operon, where the bont/(b) gene or the ha-cluster bont/a1 of BoNT/A1 strains is located (Figure 7). This site is named for the oppA (extracellular solute-binding protein) and brnQ (branched-chain amino acid transport protein) genes located here. This is the only site where a bont/(b) gene, albeit silent due to a mutation, was identified within the chromosome; the bont/b1 and bont/bv b genes in the strains analyzed in this study were located within plasmids. This site does not contain bont genes in the BoNT/A2-, BoNT/A3-, BoNT/A4- or BoNT/B1-producing strains. The oppA/brnQ operon was not present within the serotype E strains, the BoNT/E-producing C. butyricum or the npBoNT/B strain.

At approximately 2,704 kb within the ATCC 3502 chromosome (1.102 Mb in Eklund 17B) is the location of the rarA operon. No bont genes are located here in the Group I proteolytic strains. However, in the BoNT/E-producing C. botulinum (Beluga and Alaska E43) and C. butyricum (BL 5262) strains, the rarA gene is split and the bont/e gene cluster and other genes are inserted. Figure 8(a) shows the similarity of the rarA region in the npBoNT/B strain and the two BoNT/E-producing species (C. botulinum and C. butyricum) and also the gene organization of the inserted sequence. Although these regions appear similar, the bonts in the strains are in different locations: the bont/np b is located within a small plasmid, whereas the bont/e genes are located within the chromosome.

Figure 4

Relative locations of the different bonts within the chromosome or plasmid. Three operons (designated arsC, oppA/brnQ and rarA) within the chromosome of the ATCC 3502 or Beluga strain show where the various bonts are located within the different strains. The bont/a1 of the A1(B) strain, the bont/a2 of the Kyoto-F strain and the bont/f of the Langeland strain are located within the arsC operon. The bont/(b) within the A1(B) strain and the bont/a1 within the Hall strain are located within the oppA/brnQ operon. The rarA operon contains the bont/e complex within the Beluga, Alaska E43 and C. butyricum BL 5262 strains. The relative locations of the bonts in the Group I plasmids are indicated in pCLJ. One site contains either the bont/a3 in the Loch Maree strain, the bont/a4 in the bivalent 657 strain or the bont/f in the Bf strain. Another site contains either the bont/b1 in the Okra strain, the bont/bv b in 657 or the bont/bv b in the Bf strain. The bont/np b location within pCLL in the Eklund 17B strain is indicated. This figure shows the common sites of the bonts in different strains, providing evidence that the bonts are not randomly located within the chromosome or plasmid.

The gene sequence of the split rarA in the serotype E strains can be spliced together to encode an intact, fully functional protein. The split (codon 102) is at the same site in both the C. botulinum and C. butyricum strains. Interestingly, the inserted sequences contain not only the bont/e gene cluster but also another rarA gene that is intact. Therefore, these strains retain an intact copy of rarA in addition to the split one.
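Splicing the two halves of a split gene back together, as was done for the serotype E rarA genes, amounts to excising the inserted island and concatenating the flanking fragments. The Python sketch below illustrates this on a toy sequence. The function names, the 0-based half-open coordinate convention and the simplified open reading frame check (ATG start, a single terminal stop codon) are assumptions made for illustration; this is not the authors' pipeline.

```python
# Hypothetical helpers; coordinates are 0-based and half-open.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_split_gene(seq: str, gene_start: int, ins_start: int,
                      ins_end: int, gene_end: int) -> str:
    """Rejoin a coding sequence interrupted by an inserted island.

    The interrupted gene spans seq[gene_start:gene_end] and the island spans
    seq[ins_start:ins_end]; the island is dropped and the two gene fragments
    are concatenated.
    """
    if not gene_start <= ins_start <= ins_end <= gene_end:
        raise ValueError("the insertion must lie within the gene span")
    return seq[gene_start:ins_start] + seq[ins_end:gene_end]


def codon_at_split(gene_start: int, ins_start: int) -> int:
    """1-based number of the codon whose first base follows the insertion point."""
    return (ins_start - gene_start) // 3 + 1


def is_intact_orf(cds: str) -> bool:
    """Simplified check (assumption): ATG start, whole codons, one terminal stop."""
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


# Toy example: an 18 nt ORF interrupted after its second codon by a 10 nt island.
gene = "ATG" + "GCT" * 4 + "TAA"
island = "CCCCCGGGGG"
genome = "NNN" + gene[:6] + island + gene[6:] + "NNN"
spliced = splice_split_gene(genome, 3, 9, 19, 31)
print(spliced == gene, codon_at_split(3, 9), is_intact_orf(spliced))
```

With real data the island boundaries would come from alignment breakpoints. Under this convention, an insertion point at 0-based offset 303 within the gene corresponds to codon 102.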

Figure 8(b) compares the nucleotide sequences of the spliced and intact rarA genes in these strains and other species. The dendrogram shows that the intact (inserted) rarA genes are almost identical to each other in the BoNT/E-producing C. botulinum and C. butyricum strains, suggesting a common source. However, the sequences of the spliced rarA genes within these C. botulinum and C. butyricum strains are not identical. The spliced rarA genes within the Beluga and Alaska E43 strains are almost identical to each other and very similar to the Eklund 17B strain rarA sequence. The different sequences of the rarA genes that are split by the bont/e insertion in C. botulinum and C. butyricum show that these were separate events occurring in different bacterial backgrounds.

Figure 5

Comparison of the arsC operon in different strains. The region of the arsC operon within the ATCC 3502 strain was compared to the arsC region in other strains. The horizontal arrows indicate coding sequences (CDSs). Gene designations are labelled above the arrow. GenBank locus IDs are labelled below the arrow. The first CDS was given the full GenBank locus ID followed by an abbreviated ID that uses only the last two to three digits. At this site (847-868 kb) there is no toxin gene cluster within ATCC 3502; however, this site contains the bont/a1 of the A1(B) strain and the bont/f of the Langeland strain. The components in this region are depicted in the Kyoto-F, Loch Maree, 657 and Okra strains. The regions flanking the arsC operon are similar upstream and downstream in each of these strains.

The mechanism of the insertion event likely involves the rarA protein, which is a resolvase involved in recombination or insertion events of transposons. Transposon activities within Gram-positive bacteria are not well characterized but are known to be responsible for the exchange of antibiotic resistance genes and/or genomic islands in other bacteria, for example methicillin resistance in Staphylococcus aureus [24]. The rarA insertion site was likely targeted by the presence of a rarA gene within the inserted region. The presence of an IS element and a transposon resolvase involved in horizontal gene transfer suggests that either or both could have played a role in the insertion of the bont/e gene cluster into the chromosome.

Location  of  the  BoNTs  within  plasmids 

The plasmid location of the bont/a3, bont/a4, bont/bv b and bont/b1 genes from the analysed strains has been previously described [9,10]. The bont/np b gene was recently shown by pulsed field gel electrophoresis to be located within a small plasmid [11]. The genomic sequence data for the Eklund 17B strain verified the presence of bont/np b within a unique 47.6 kb plasmid. In addition, the location of the bont/bv b and bont/f within a plasmid (pBf) in the Bf strain was identified based upon synteny results and the high sequence homology of four Bf strain contigs (ABDP01000018.1, ABDP01000023.1, ABDP01000034.1 and ABDP01000069.1) with pCLJ, pCLD and pCLK. The comparisons of pCLJ to the Bf contig sequences yielded the following results: 99% identity and 89% coverage to contig ABDP01000023.1 (68.4 kb), which contains bont/f; 99% identity and 81% coverage to contig ABDP01000018.1 (84.3 kb), which contains bont/bv b; 96% identity and 52% coverage to contig ABDP01000034.1 (16.8 kb); and 98% identity and 65% coverage to contig ABDP01000069.1 (0.8 kb).
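The identity and coverage values above summarize BLAST alignments between pCLJ and each contig. One plausible way to compute such summaries is sketched below in Python. It assumes that identity is weighted by alignment length and that coverage is the fraction of the contig spanned by the union of hits. The HSP tuple layout is hypothetical, and the numbers are toy values, not the Bf data.

```python
from typing import List, Sequence, Tuple

# Hypothetical HSP record: (contig_start, contig_end, alignment_length, identical_positions).
# Coordinates are 1-based and inclusive; start > end denotes a reverse-strand hit.
HSP = Tuple[int, int, int, int]


def merged_length(intervals: Sequence[Tuple[int, int]]) -> int:
    """Number of positions covered by the union of inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def identity_and_coverage(hsps: List[HSP], contig_length: int) -> Tuple[float, float]:
    """Length-weighted percent identity, and percent of the contig spanned by hits."""
    aligned = sum(h[2] for h in hsps)
    identical = sum(h[3] for h in hsps)
    spans = [(min(s, e), max(s, e)) for s, e, _, _ in hsps]
    identity = 100.0 * identical / aligned
    coverage = 100.0 * merged_length(spans) / contig_length
    return identity, coverage


# Toy numbers only; overlapping hits are counted once for coverage.
hsps = [(1, 500, 500, 495), (401, 900, 500, 500), (1000, 951, 50, 49)]
print(identity_and_coverage(hsps, 1000))
```

Merging overlapping intervals before measuring coverage prevents overlapping HSPs from inflating the covered fraction. The function expects at least one HSP.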

These results are depicted in Figure 9, where the sequences of four plasmids (the bivalent pCLJ, pCLK, pCLD and pBf, represented by four contigs) are compared. Regions of homology among these plasmids are indicated in red, and the toxin regions of bont/a3, bont/a4, bont/b1, bont/bv b and bont/f are indicated in yellow or blue. The figure cannot accurately depict pBf because the sequence data are incomplete (170 kb); however, it does appear that the bont/f and bont/bv b are located within a plasmid that is very similar to the bivalent pCLJ. It is interesting to note the similar locations of the bonts in the two plasmids, where it appears that bont/a4 has been replaced with bont/f.

Figure 6

Dendrogram of arsC gene. The 392 nucleotides of arsC (arsenate reductase) were compared among C. botulinum strains and other clostridial species. Where multiple copies of arsC were present within a strain, the copies are designated as C-1, C-2 or C-3 based upon their location within the operon shown in Figure 5. The dendrogram illustrates that the arsC copies within the same strain are different from each other and that the arsC sequences from Group I, II and VI strains differ.

An examination of the bont locations within the plasmids shows that, as with the chromosome, there are specific sites within the plasmid where the bonts are located (Figures 4 and 10). The two plasmid sites contain either: (1) the bont/a3 gene cluster, the bont/f gene cluster from the Bf strain or the bont/a4 gene cluster from the 657 strain; or (2) the bont/b1 gene cluster, the bont/bv b gene cluster from the 657 strain or the bont/bv b gene cluster from the Bf strain. Interestingly, the location of bont/a or bont/f genes at the same site, seen here within the plasmid, was also observed within the chromosome.

The second plasmid site within the Group I strains contains either the bont/bv b or the bont/b1 gene. However, the bont/np b is located within a very different plasmid and host background from the proteolytic strains. Examination of the regions flanking the bont/np b reveals an IS element, a transposon-associated resolvase and a site-specific recombinase downstream of the gene. Like bont/e, the bont/np b is another example of a bont in proximity to a transposon-associated protein involved in recombination and insertion events within a Group II background.

Recombination  within  the  ntnh  gene 

The ntnh gene has been consistently located within the toxin complexes in strains representing each of the seven serotypes (A-G) and has been identified as a region of recombination among strains of different serotypes [21]. The ntnh dendrogram (Figure 11) illustrates the variation observed among the different serotypes. The ntnh within the A2-A4 subtypes (orfX cluster) is very different from that of the A1 subtypes (ha cluster), represented by the ATCC 3502 or the A1(B) strains. A recombination event has occurred approximately midway within the ntnh gene between a serotype B ntnh and a serotype A ntnh, resulting in a hybrid or recombinant B/A ntnh; this recombination event has placed the bont/a1 within the ha cluster usually associated with bont/b.

Another recombination event was observed in the BoNT/A2-producing 7103-H strain associated with an infant botulism case in Japan, as is evident from its location within the dendrogram. The first 2,000 bases in this recombinant ntnh are almost identical to a BoNT/C1 ntnh (99.6% identity) and the final 1,582 bases are 99.1% identical to the ntnh of the BoNT/A2 Kyoto-F strain (designated C/A ntnh) [25]. This recombination event lies in the same region as, but not at the same site as, the hybrid B/A ntnh described above.

The dendrogram also illustrates that the ntnh genes of the BoNT/A2-A4 subtypes and the serotype F Langeland strain are very similar to each other, yet their bonts differ. A comparison of the ntnh genes of BoNT/A2 Kyoto-F and BoNT/F Langeland shows them to be 97.0% identical over the first 3,443 nucleotides, but the identity decreases to 51.0% in the final 58 nucleotides. This finding indicates that a possible recombination event has occurred in the 3' terminus of the ntnh gene and/or in the intergenic region between the ntnh and bont genes. The occurrence of this recombination event is also supported by the clustering of the serotype F Langeland ntnh gene with the ntnh genes of BoNT/A2, /A3 and /A4 strains in the dendrogram, rather than with the ntnh genes of other serotype F strains.
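A sharp drop in identity on one side of an alignment position, such as 97.0% over the first 3,443 nucleotides against 51.0% over the final 58, is the kind of signal used to locate a recombination breakpoint. The Python sketch below is a simplified stand-in for a SimPlot-style analysis, not SimPlot itself. It computes identity on either side of a candidate breakpoint in a pairwise alignment, scans for the breakpoint with the largest contrast, and produces a sliding-window identity profile. The function names and toy sequences are hypothetical.

```python
import math
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return math.nan
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def split_identity(a: str, b: str, breakpoint: int) -> Tuple[float, float]:
    """Identity upstream and downstream of a proposed breakpoint (alignment column)."""
    return (percent_identity(a[:breakpoint], b[:breakpoint]),
            percent_identity(a[breakpoint:], b[breakpoint:]))


def best_breakpoint(a: str, b: str, min_segment: int = 10) -> int:
    """Alignment column that maximizes the identity contrast between the two sides."""
    def contrast(k: int) -> float:
        left, right = split_identity(a, b, k)
        return abs(left - right)
    return max(range(min_segment, len(a) - min_segment + 1), key=contrast)


def sliding_identity(a: str, b: str, window: int = 200,
                     step: int = 20) -> List[Tuple[int, float]]:
    """Identity profile as (window start, percent identity) pairs."""
    return [(s, percent_identity(a[s:s + window], b[s:s + window]))
            for s in range(0, len(a) - window + 1, step)]


# Toy alignment: identical for 80 columns, then every base differs.
a = "ACGT" * 25
shift = {"A": "C", "C": "G", "G": "T", "T": "A"}
b = a[:80] + "".join(shift[c] for c in a[80:])
print(split_identity(a, b, 80), best_breakpoint(a, b))
```

Real alignments give noisier profiles than this toy, so a breakpoint found this way is only a candidate that still needs inspection, as the authors did with SimPlot and BioEdit.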
Figure 7

Comparison of the oppA/brnQ operon in different strains. The region of the oppA/brnQ operon within the ATCC 3502 strain was compared in the different strains. The horizontal arrows indicate coding sequences (CDSs). Gene designations are labelled above the arrow. GenBank locus IDs are labelled below the arrow. The first CDS was given the full GenBank locus ID followed by an abbreviated ID that uses only the last two to three digits. The bont/a1 of the ATCC 3502 strain and the bont/(b) of the A1(B) strain are located in this region (895-915 kb). No bont genes of the other strains (Langeland, Kyoto-F, 657, Okra and Loch Maree) are located here. The regions flanking the oppA/brnQ operon are similar upstream and downstream in each of these strains.


These examples show that recombination can occur between the ntnh genes of the serotype A and C toxin complexes, between those of serotypes B and A, and at the 3' terminus of an A ntnh or the intergenic region between it and the bont/f gene; such recombination events have contributed to the variation observed. These events also illustrate that bacteria containing these genes come into close proximity to each other within an anaerobic environment that allows exchange and recombination.

Discussion 

Comparisons of the complete and shotgun sequence data from strains representing Groups I and II of C. botulinum and a C. butyricum type E strain were performed in order to further understand the variation observed among the BoNT-producing clostridia and to examine the unusual attributes observed within the species. These include the presence of similar bonts in different genomic backgrounds (bont/e in C. botulinum and C. butyricum, for example), the presence of different bonts in similar backgrounds (serotype A, proteolytic B and F C. botulinum strains) and the existence of bivalent strains. New technologies have made genomic sequencing more affordable and rapidly provide a wealth of sequence information that describes an organism at the molecular level. This study utilized the clostridial genomic sequence data and generated comparisons of: the 16S rrn genes from various clostridial species; the genomic synteny among strains; the locations of bont toxin clusters; and the components in their flanking regions. The data tie previous historical research to molecular results and increase our understanding of the species.

The molecular data support the historical Group I-IV classification system for C. botulinum, which is based upon biochemical and physical properties. Comparisons of the organization of the genomic sequences in the synteny plots presented here confirm that serotype A, B and F strains of proteolytic Group I share a similar C. sporogenes genetic background. Likewise, the Group II nonproteolytic strains that express B, E and F toxins share a similar genomic organization. The 16S rrn dendrogram shows that the different Groups I-IV within the C. botulinum species designation are as distinct as other clades of clostridia that have been classified or named as separate species.

Figure 8

(a) Location of the rarA operon within C. botulinum and C. butyricum strains and (b) dendrogram of rarA genes from different clostridial species. The rarA operon in the ATCC 3502 strain is compared to the rarA operon in the Group II and VI strains. In the Eklund 17B strain the rarA gene is intact. However, in the Alaska E43, Beluga and C. butyricum BL 5262 strains, the rarA gene is split and a bont/e gene cluster has been inserted. Note the similarity of the components within the inserted sequence and that it also contains an intact rarA gene. The regions flanking the rarA operon are similar upstream and downstream in the Group II strains. (b) The dendrogram of rarA genes shows that some strains contain two copies of rarA, one that is intact and one that is split by the insertion of the bont/e complex genes. The 1,195 nucleotides of rarA from both intact and split genes were compared; the sequences of split rarA genes were spliced together to make full-length genes. The dendrogram shows that the sequences of the spliced rarA in the C. botulinum Alaska E43 and Beluga type E strains are similar to each other but different from the spliced rarA in C. butyricum BL 5262. This difference indicates that the insertion of the toxin gene cluster occurred as a separate event in each species. The inserted/intact rarA sequences in both species are similar, indicating a common source.

Figure 9

Plasmid synteny among pCLK, pCLJ, pCLD and pBf. Three fully sequenced plasmids (pCLK, pCLJ and pCLD) are compared to four contigs of the Bf strain that showed identity to pCLJ by BLASTN analysis. Regions of homology among the bivalent pCLJ, pCLK, pCLD and the four pBf contigs are indicated in red; the toxin regions containing bont/a3, bont/a4 or bont/f are coloured blue, and those containing bont/bv b and bont/b1 are coloured yellow. The comparisons show the similar location of the bont/bv b and bont/b1 among the three plasmids. The bont/f and bont/a3 also have similar locations but are inverted in relation to bont/a4. The four Bf contigs, ABDP01000018.1 (84.3 kb), ABDP01000023.1 (68.4 kb), ABDP01000034.1 (16.8 kb) and ABDP01000069.1 (0.8 kb), were ordered according to pCLJ. The coloured symbols are expanded in Figure 10 to detail the genes located in these regions.

The locations of the bont genes in these strains revealed that the sites are not randomly distributed in the host genomes. The bont and associated cluster genes are located within plasmids of varying sizes (47.6-270 kb) as well as within the chromosome. Franciosa et al. recently examined the location of the toxin cluster in 63 BoNT/B-producing C. botulinum strains using pulsed field gel electrophoresis; they found that each of the toxin gene clusters was located within a plasmid ranging in size from ~55 to ~245 kb [11].

The presence of the toxin cluster within either plasmids or the chromosome, in strains of the same or different serotypes, is consistent with horizontal transfer events mediated by plasmids or phage and with recombination events mediated by mobile genetic elements such as transposons. These events result in the integration of the bont genes into different locations (plasmids, chromosome) and different host backgrounds (Groups I-VI), as is observed within the BoNT-producing clostridia. The detailed examination of the bont locations reveals that these events occur more frequently by homologous or targeted transposition than by random or novel integration.
Figure 10

Plasmid regions containing bonts. This is an expanded image of the regions between the symbols in Figure 9 and provides details of the genes located within the different plasmids in these areas. The horizontal arrows indicate coding sequences (CDSs). Gene designations are labelled above the arrow. GenBank locus IDs are labelled below the arrow. The first CDS was given the full GenBank locus ID followed by an abbreviated ID that uses only the last two to three digits. The figure shows that the bonts within the plasmids in these Group I strains are located in either of two sites. One location (between the yellow and blue symbols) contains either the bont/f of the Bf strain, the bont/a4 of the 657 strain or the bont/a3 of the Loch Maree strain. The numbers in parentheses, such as 23,742 bp in the pBf panel, indicate additional sequence in that region that is not detailed here but is shown in Figure 9. The other plasmid site, which contains the bont/b in several plasmids, is depicted between the green and purple symbols. This region contains the bont/bv b or bont/b1 in the Bf strain, the 657 strain or the Okra strain. The bottom panel depicts the region containing the bont/np b within the Eklund 17B strain. This region shares no similarity with the Group I plasmids.
Figure 11

Dendrogram of the ntnh gene in different BoNT-producing strains. The dendrogram of 3,471 nucleotides of the ntnh gene shows the variation within this gene. Some of the ntnh sequence variation in the strains is due to recombination events. The location of the ntnh of the ATCC 3502, ATCC 19397 and Hall strains close to the ntnh of the serotype B strains resulted from a recombination event midway within the ntnh gene, producing a recombinant B/A ntnh, that is, a partial B ntnh and a partial A ntnh. Another similar recombination event in the ntnh of the 7103-H strain has resulted in a recombinant C/A ntnh, that is, a partial C ntnh and a partial A ntnh. The location of the Langeland F ntnh in the dendrogram near the serotype A strains Kyoto-F, Loch Maree and 657 resulted from a recombination event near the 3' end of, or following, the ntnh gene where a bont/f was inserted. The dendrogram illustrates the variation in the ntnh genes from multiple serotypes and the location of recombinant ntnh genes.


The species also appears to undergo active recombination within the toxin complex genes, particularly at multiple sites within the ntnh gene. Examples of recombination include: (1) ntnh - the hybrid B/A ntnh placing the bont/a1 within the ha cluster of serotype B, and the hybrid C/A ntnh placing the bont/a2 downstream of a C/A ntnh hybrid; (2) bont -
the hybrid bont/a2 gene consisting of bont/a1 and bont/a3, and the bont/c/d and bont/d/c hybrids; and (3) ntnh/bont - the site between the ntnh and bont genes, placing a bont/f downstream of a bont/a2 ntnh gene. These recombination events compound the confusion surrounding the taxonomy of the species and make it difficult to describe the strains clearly with the current nomenclature. As the examples above show, multiple recombination events have contributed significantly to the genetic diversity observed in the bonts.

This study provides the first molecular information to explain the unusual observation of a bont/e within both C. botulinum and C. butyricum type E strains. By examining the bont/e location within the two species, an insertion event was identified that targeted the same rarA gene. The rarA is a transposon-associated gene with recombinase activity that could explain the precise excision and integration of the bont/e in the two species. Interestingly, the comparison of the sequences of the spliced and intact rarA genes revealed that this insertion event occurred separately in the two species, yet the inserted region containing the bont/e gene came from a common source.

Other transposon-associated proteins were identified downstream of the bont/np b, where an IS element, a resolvase and a site-specific recombinase are located. Unfortunately, Gram-positive transposons are not well characterized and elude detection because they lack perfect inverted repeats flanking the transposed region or are not replicated in the process [26]. Although specific transposons were not identified near the toxin complex genes, transposon-associated proteins were found. The identification of these proteins, the presence of the toxin complex in different host backgrounds, its location within the chromosome as often as within plasmids and the identification of specific targeted insertion sites in the same or different species implicate transposon activity as at least one mechanism for bont movement.

The genomic analyses also revealed the plasmid location of bont genes in two strains: the bont/np b in the Eklund 17B strain, and the bont/bv b and bont/f in the Bf strain. The bont/np b-containing plasmid could have been horizontally transferred to a Group II bacterial background, or it could have resulted from a transposon-mediated insertion into a unique plasmid. Likewise, the location of bont/bv b and bont/f within a plasmid homologous to the bivalent pCLJ, with bont/a4 replaced by bont/f, shows that the two sequenced bivalent strains contain bonts in similar locations and that the bonts are distant from each other. It is interesting that, within the two sequenced bivalent strains, the bonts are within either an ha cluster or an orfX cluster. These different clusters could provide differing protection or expression of the bont.

The finding that the ntnh gene has recombined to place the bont/a1 within the ha cluster associated with BoNT/B strains helps explain the perception of the 'normal' toxin cluster associated with bont/a1 strains. The success of the ha toxin cluster strains, as evidenced by their widespread isolation in conjunction with human botulism cases, indicates that the ha components must confer some cultural or toxicity advantage that is not yet clearly understood.

Conclusion 

This study, which compares 15 clostridial genomic sequences, was undertaken in order to identify the underlying events that result in the genetic diversity within the C. botulinum species. As more genomic sequences become available, additional clues to understanding this complex species and its many toxin types and subtypes will be uncovered. This molecular analysis provided: (1) a 16S rrn dendrogram of the clostridial species that included recently sequenced members; (2) synteny plots that visualize chromosomal and plasmid gene organization; (3) the identification of common locations of the bonts within the chromosome and plasmid; (4) the components of the bont-containing regions that identify common features; (5) a description of an insertion event mediated by a transposon-associated resolvase placing bont/e in both C. botulinum and C. butyricum type E strains; (6) plasmid analyses which show that the bonts within the Bf strain and the npBoNT/B strain are located within a plasmid; and (7) the identification and examples of recombination within the ntnh gene, the bont gene and the region between these two genes.

The findings illustrate that the bonts within the clostridia insert, recombine and are exchanged both within a species and among species. The presence of bont genes within stable plasmids that are not lost suggests that the genes confer some survival advantage on the host bacteria. Whether the bont gene is within a plasmid or the chromosome, in a single or bivalent arrangement, or within the orfX or ha toxin gene cluster, the toxin has been both retained in, and spread among, a variety of different clostridial species termed Groups. The toxin complex genes have undergone recombination, insertion and horizontal gene transfer events that have yielded many variations of the bont gene, thereby producing the toxin serotypes and subtypes. Horizontal gene transfer events and genomic rearrangements are important mechanisms for bacterial survival and evolution. Within the clostridia these attributes have enabled the bont genes to continue to survive in different clostridial host backgrounds and environments.

Methods 

Strains 

Table 1 lists the strains examined in this study. They represent C. botulinum A, B, E and F serotypes and subtypes, including two bivalent strains (BoNT/Ba4, BoNT/Bf), a strain containing both bont/a1 and bont/b gene clusters in which the bont/b gene is not expressed (BoNT A1(B)) and a BoNT/E4-expressing C. butyricum; a C. tetani and a C. sporogenes strain were also included for comparison [8]. Some genomic sequences were complete or in several large contigs and others were whole genome shotgun sequences.

Genomic  annotation 

Annotation of the assembled genome sequence was carried out with the genome annotation system GenDB [27] and the RAST server [28]. A combined gene prediction strategy was applied by means of the GLIMMER 2.0 system and the CRITICA program suite [29] along with post-processing by the RBSfinder tool [30]. tRNA genes were identified with tRNAscan-SE [31]. The deduced proteins were functionally characterized by automated searches in public databases, including SWISS-PROT and TrEMBL [32], Pfam [33], TIGRFAM [34], InterPro [35] and KEGG [36]. Additionally, SignalP [37], helix-turn-helix [38] and TMHMM [39] were applied. Finally, each gene was functionally classified by assigning a clusters of orthologous groups (COG) number and the corresponding COG category [40], as well as gene ontology numbers [41].

Genome  and  plasmid  comparisons 

Homology searches were conducted at the nucleotide and amino acid sequence level using BLAST [42]. In order to obtain a list of orthologs from bacteroidete genomes, a Perl script that determines bidirectional best hits was written; for example, genes g and h were considered orthologs if h was the best BLASTP hit for g and vice versa. E values
of 10^-15 were acceptable. A gene was considered strain specific if it had no hits with an E value of 10^-5 or less. Additional genomic comparisons and dotplot analyses were performed with genome alignment tools such as MUMmer2 [43], NUCmer [44] and the web interface of the Artemis Comparison Tool (ACT) http://www.webact.org [45].
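The bidirectional best-hit rule described above can be expressed compactly. The authors used a Perl script; the Python sketch below is an independent illustration, not their code. It assumes BLASTP tabular output (BLAST+ -outfmt 6 default columns) and ranks hits by bit score. The E value thresholds are left as parameters, and the function names and toy records are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, Tuple

# Assumed -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
Row = List[str]


def read_tabular(text: str) -> List[Row]:
    """Parse tab-separated BLAST output into rows of strings."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def best_hits(rows: Iterable[Row], max_evalue: float) -> Dict[str, str]:
    """Highest-bitscore subject for each query among hits with E value <= max_evalue."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Row], b_vs_a: List[Row],
                         max_evalue: float) -> Set[Tuple[str, str]]:
    """Pairs (g, h) where h is g's best hit and g is h's best hit."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return {(g, h) for g, h in ab.items() if ba.get(h) == g}


def strain_specific(genes: Iterable[str], a_vs_b: List[Row],
                    max_evalue: float) -> Set[str]:
    """Genes with no hit at or below the E value threshold."""
    hit = {row[0] for row in a_vs_b if float(row[10]) <= max_evalue}
    return {g for g in genes if g not in hit}
```

Running both directions (strain A against B, then B against A) and intersecting the best-hit maps is what makes the relationship bidirectional. A one-way best hit alone would also pair paralogs.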

Toxin gene island insertion patterns were compared using the ACT alignment program at the default settings. Predicted toxin gene island insertion sites were identified from sequence alignments, and breakpoint sites were further curated manually. Gene definitions were manually annotated by inspecting BLASTP results and sequence alignments. The gene name and locus ID were assigned based on the NCBI Reference Sequence file. Insertion sequence (IS) elements were identified and classified using the IS Finder database [46].
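Insertion sites such as the split rarA locus appear in a genome alignment as two blocks that are adjacent in the reference but separated by a long stretch of extra sequence in the query. The Python sketch below flags such gaps from a list of collinear alignment blocks. It assumes forward-strand blocks whose coordinates have already been exported from an aligner. The thresholds, class and function names are hypothetical. The authors curated breakpoints manually, so a filter like this could only serve as a first pass.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One collinear alignment block (1-based, inclusive, forward strand assumed)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def candidate_insertions(blocks: List[Block], max_ref_gap: int = 50,
                         min_insert: int = 1000) -> List[Tuple[int, int, int]]:
    """Return (reference breakpoint, inserted start, inserted end) tuples.

    An insertion is reported where consecutive blocks are (nearly) contiguous in
    the reference but separated by at least min_insert bases in the query.
    """
    found = []
    ordered = sorted(blocks, key=lambda b: b.ref_start)
    for left, right in zip(ordered, ordered[1:]):
        ref_gap = right.ref_start - left.ref_end - 1
        qry_gap = right.qry_start - left.qry_end - 1
        if ref_gap <= max_ref_gap and qry_gap >= min_insert:
            found.append((left.ref_end, left.qry_end + 1, right.qry_start - 1))
    return found


# Toy layout: a 40 kb island in the query between reference positions 5000 and 5001.
blocks = [Block(1, 5000, 1, 5000),
          Block(5001, 9000, 45001, 49000),
          Block(9001, 12000, 49011, 52010)]
print(candidate_insertions(blocks))
```

Inverted or rearranged blocks would need separate handling, which this sketch omits.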

Plasmid analysis of the Bf contigs was performed using BLASTN with the pCLJ sequence and 70 Bf contigs. All sequences with hits passing the E value cutoff of 1e-20 were extracted for further comparison using the PROmer program from the MUMmer package. Four putative pBf sequences, from contigs ABDP01000018.1, ABDP01000023.1, ABDP01000034.1 and ABDP01000069.1, were aligned to pCLJ sequences. MUMmerplot was used to display the four contigs (pBf), which were ordered according to pCLJ reference coordinates.
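Filtering BLASTN hits at the 1e-20 cutoff and ordering the retained contigs along pCLJ can be sketched as follows. The sketch assumes tabular BLASTN output with pCLJ as the query and the contigs as subjects, and places each contig by its leftmost hit on the reference. The function name, contig names and coordinates are hypothetical. The sketch shows only the ordering idea and does not reproduce MUMmerplot.

```python
from typing import Dict, Iterable, List

# Assumed -outfmt 6 columns, with the reference plasmid as the query:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore


def order_contigs(rows: Iterable[List[str]], max_evalue: float = 1e-20) -> List[str]:
    """Contigs with at least one hit passing the cutoff, sorted by leftmost reference position."""
    first_ref_pos: Dict[str, int] = {}
    for row in rows:
        contig = row[1]
        qstart, qend, evalue = int(row[6]), int(row[7]), float(row[10])
        if evalue > max_evalue:
            continue  # hit does not pass the E value cutoff
        pos = min(qstart, qend)
        if contig not in first_ref_pos or pos < first_ref_pos[contig]:
            first_ref_pos[contig] = pos
    return sorted(first_ref_pos, key=lambda c: first_ref_pos[c])
```

A contig is kept if any of its hits passes the cutoff, and its position is the smallest reference coordinate among those passing hits.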

Dendrograms 

DNA alignments were created with a combination of Sequencher software http://www.genecodes.com/, PAUP http://paup.csit.fsu.edu/, MUSCLE http://www.drive5.com/muscle/, CLUSTAL-W http://www.ebi.ac.uk/clustalw/ and hand editing with BioEdit http://www.mbio.ncsu.edu/BioEdit/bioedit.html software. The alignments were gap stripped and then analysed using PHYLIP http://evolution.genetics.washington.edu/phylip.html, with dnadist under the F84 model of evolution and a transition to transversion ratio of 2.0 (default), and with neighbor joining algorithms. Dendrograms were rendered with FigTree http://tree.bio.ed.ac.uk/software/figtree/. Intra- and inter-serotype BoNT gene recombination was explored with SimPlot http://sray.med.som.jhmi.edu/SCRoftware/simplot/ [47] and BioEdit.
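The gap-stripping and distance steps of the dendrogram workflow can be illustrated with a short Python sketch. PHYLIP dnadist was run with the F84 model. To keep the example self-contained, the sketch substitutes the simpler Kimura two-parameter (K2P) distance, which is not identical to F84, and it omits the neighbor-joining step. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def gap_strip(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every alignment column that contains a gap in any sequence."""
    seqs = {name: s.upper() for name, s in alignment.items()}
    length = len(next(iter(seqs.values())))
    keep = [i for i in range(length) if all(s[i] != "-" for s in seqs.values())]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance over columns where both bases are A, C, G or T."""
    sites = transitions = transversions = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip ambiguity codes
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise K2P distances after gap stripping, keyed by sorted name pairs."""
    stripped = gap_strip(alignment)
    return {(m, n): k2p_distance(stripped[m], stripped[n])
            for m, n in combinations(sorted(stripped), 2)}
```

K2P is undefined for highly divergent pairs: when a logarithm argument reaches zero or below, math.log raises ValueError. The resulting matrix would then be passed to a tree-building method such as neighbor joining.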